Dependency graph #659

Open
polinabinder1 wants to merge 4 commits into main from polinabinder/package_dependencies

Conversation

polinabinder1
Collaborator

@polinabinder1 polinabinder1 commented Jan 24, 2025

This generates a dependency graph between the bionemo sub-packages. It also parses the source files and checks that the imports between the bionemo sub-packages agree with the dependencies declared in the pyproject.toml files.

Signed-off-by: Polina Binder <[email protected]>
@codecov-commenter

codecov-commenter commented Jan 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.75%. Comparing base (60a6dad) to head (d4c7b08).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #659   +/-   ##
=======================================
  Coverage   86.75%   86.75%           
=======================================
  Files         118      118           
  Lines        7059     7059           
=======================================
  Hits         6124     6124           
  Misses        935      935           


return pyproject_files


def parse_dependencies(pyproject_path):
Collaborator

Do you think we could also parse tach.toml and possibly warn if the dependency graphs are different? The tach check actually enforces this separation during CI, so it's probably more accurate.

Collaborator

Is there a check in place to ensure that the tach.toml and pyproject.toml files are up-to-date and valid in terms of dependencies? For example, does PyPI installation and importing all subpackages automatically verify this?

We could implement regular checks or enforcement in the CI pipeline. If the above method isn't sufficient, we can create a script that parses the pyproject.toml files of the main project and its subpackages, extracts the import paths used in Python scripts under the src directory of each subpackage, and verifies all imports starting with from bionemo.
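The import-extraction part of such a script could be sketched with the standard `ast` module, as below. This is an illustration under assumptions rather than the PR's code: it treats the first two components of an import path (for example `bionemo.core`) as the sub-package, and the function names and sample source are hypothetical.

```python
import ast


def bionemo_imports(source: str, prefix: str = "bionemo") -> set[str]:
    """Return `bionemo.<subpackage>` for every absolute import under `prefix`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == prefix:
                # `from bionemo import core` names the sub-package in the alias.
                modules = [f"{prefix}.{alias.name}" for alias in node.names]
            else:
                modules = [node.module]
        else:
            continue  # not an import, or a relative import inside the package
        for module in modules:
            parts = module.split(".")
            if parts[0] == prefix and len(parts) > 1:
                found.add(".".join(parts[:2]))
    return found


def undeclared_imports(source: str, package: str, declared: set[str]) -> set[str]:
    """Sub-packages that `package` imports without declaring them."""
    return bionemo_imports(source) - declared - {package}


# Illustrative source text; a real script would read every .py file under src/.
SOURCE = """
import os
from bionemo.core.data import load
import bionemo.llm.model
from bionemo import testing
from . import helpers
"""
missing = undeclared_imports(SOURCE, package="bionemo.llm", declared={"bionemo.core"})
print(missing)  # {'bionemo.testing'}
```

Parsing with `ast` rather than matching text avoids false positives from comments and strings, though it will not see imports performed dynamically at runtime.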

Collaborator

Dorota -- I suggest we constrain scope for now to just drawing the dependency graph.

I agree it's a good idea to do what you're describing, but it would increase the scope of this substantially and at the moment there's other stuff we gotta do :)

Collaborator

@dorotat-nv dorotat-nv Jan 28, 2025

@trvachov, @polinabinder1: in that case, it would be good to file this as a GitHub issue and a JIRA ticket, and to add a warning note that this method does not ensure the dependency graph is correct and that the approach proposed in my comment, or an alternative tool, is still needed to complete this task.

Collaborator Author

The new code gets the tach.toml dependencies and checks that the code imports in the sub-packages are correct based on what is in the pyproject.toml and tach.toml files.

Collaborator

@dorotat-nv dorotat-nv left a comment

I am not sure if pyproject.toml is still the up-to-date source of dependency information, or how it is maintained.

If not, I think we should implement a script that derives the dependency graph for the subpackages from the code itself, i.e. one that parses the pyproject.toml files of the main project and its subpackages, extracts the import paths used in Python scripts under the src directory of each subpackage, and verifies all imports starting with from bionemo.

@trvachov
Collaborator

To me this looks like a good start; we just need to document how to run the script in the GitHub PR description. I don't necessarily need this to do any more than it currently does. (@pstjohn @dorotat-nv, I suggest we constrain our review to just "graph drawing" rather than any sort of .py file parsing + CI enforcement.)

Collaborator

@dorotat-nv dorotat-nv left a comment

Could this script be relocated under internal/scripts?

@polinabinder1 polinabinder1 force-pushed the polinabinder/package_dependencies branch 2 times, most recently from ccab862 to 94fd399 Compare January 28, 2025 22:00
@polinabinder1 polinabinder1 force-pushed the polinabinder/package_dependencies branch from 94fd399 to ad206c4 Compare January 28, 2025 22:01
Signed-off-by: Polina Binder <[email protected]>
@polinabinder1
Collaborator Author

Could this script be relocated under internal/scripts?

Done!

Collaborator

@pstjohn pstjohn left a comment

This looks great; the module is laid out nicely and you have a lot of clear, reusable functions.

I do think we need unit tests for these functions though. Copilot could likely do that pretty quickly. Ideally any public function / class / method should get a test for all its expected behavior and edge cases, but even just a basic test for each of these functions would be great.

I also think we should add the resulting images to our documentation, along with the command to run to regenerate them.

Unfortunately putting this in internal/scripts means it's harder to write tests for; we don't execute pytests in that subdirectory. Maybe this could live in bionemo-fw? Or we could add that directory to our pytest call. But I'm worried that if we don't exercise this script in CI, it will break quickly without us realizing it and we won't be able to use it in planning a version bump / release strategy.

5 participants